Feature Selection using Misclassification Counts

Authors

  • Adil M. Bagirov
  • Andrew Yatsko
  • Andrew Stranieri
  • Herbert F. Jelinek
Abstract

Dimensionality reduction of the problem space, through detection and removal of variables that contribute little or nothing to classification, relieves both the computational load and the instance-acquisition effort, since all data attributes are accessed each time. The approach to feature selection in this paper is based on the concept of coherent accumulation of data about class centers with respect to the coordinates of informative features. Variables are ranked by the degree to which they exhibit random characteristics. The results are verified using the Nearest Neighbor classifier, which also helps to address feature irrelevance and redundancy, issues that ranking alone does not resolve. Additionally, feature ranking methods from independent sources are brought in for direct comparison.
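The workflow the abstract describes, ranking features by how informative (i.e., non-random) they are and then verifying the ranking with a Nearest Neighbor classifier, can be illustrated with a generic sketch. This is not the paper's algorithm: the `separation_score` below is a crude stand-in for the authors' randomness-based ranking, and the verification uses leave-one-out 1-NN misclassification counts on a hypothetical toy dataset.

```python
# Generic sketch (NOT the paper's method): rank features by a simple
# class-separation score, then verify the ranking by counting 1-NN
# misclassifications using only the top-ranked feature.

def separation_score(X, y, j):
    """Crude informativeness score for feature j: distance between
    class means divided by pooled spread (higher = less random).
    Assumes a two-class problem."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row[j])
    means = {c: sum(vs) / len(vs) for c, vs in groups.items()}
    spread = sum(abs(v - means[c]) for c, vs in groups.items() for v in vs)
    spread = spread / len(X) or 1e-9  # guard against zero spread
    c0, c1 = means.values()
    return abs(c0 - c1) / spread

def loo_1nn_errors(X, y, features):
    """Leave-one-out 1-NN misclassification count on selected features."""
    errors = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        best_label, best_d = None, float("inf")
        for k, (xk, yk) in enumerate(zip(X, y)):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in features)
            if d < best_d:
                best_label, best_d = yk, d
        if best_label != yi:
            errors += 1
    return errors

# Toy two-class data: feature 0 separates the classes, feature 1 is noise.
X = [(0.1, 5.0), (0.3, 1.0), (0.2, 3.0), (2.1, 4.0), (2.3, 0.5), (2.2, 2.5)]
y = [0, 0, 0, 1, 1, 1]

ranking = sorted(range(2), key=lambda j: -separation_score(X, y, j))
print(ranking)                            # informative feature ranked first
print(loo_1nn_errors(X, y, [ranking[0]]))  # verify with 1-NN error count
```

A low misclassification count on the top-ranked feature alone supports the ranking; comparing counts across feature subsets is what exposes irrelevant or redundant variables that a raw ranking cannot distinguish.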


Similar Articles

Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advancement in technology and businesses, fraud detection has become a critical component of financial transactions. Considering vast amounts of data in large datasets, it becomes more difficult to detect fraud transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

Full text

Probabilistic Token Selection via Fisher’s Method in Text Classification

In this project we consider a multiclass text classification problem on three newsgroups with 1,000 entries each with a feature class consisting of over 50,000 tokens. Our baseline Naive Bayes method gives a misclassification error rate of 4.51%, and we focus on variable selection methods to improve upon this error. We compare a token selection method using Naive Bayes to one using the related ...

Full text

The CASH algorithm-cost-sensitive attribute selection using histograms

Feature selection is an essential process for machine learning tasks since it improves generalization capabilities, and reduces run-time and a model's complexity. In many applications, the cost of collecting the features must be taken into account. To cope with the cost problem, we developed a new cost-sensitive fitness function based on histogram comparison. This function is integrated with a ge...

Full text

Second Order Cone Programming Formulations for Feature Selection

This paper addresses the issue of feature selection for linear classifiers given the moments of the class conditional densities. The problem is posed as finding a minimal set of features such that the resulting classifier has a low misclassification error. Using a bound on the misclassification error involving the mean and covariance of class conditional densities and minimizing an L1 norm as a...

Full text

Feature Selection Using Classifier in High Dimensional Data

Feature selection is frequently used as a pre-processing step to machine learning. It is a process of choosing a subset of original features so that the feature space is optimally reduced according to a certain evaluation criterion. The central objective of this paper is to reduce the dimension of the data by finding a small set of important features which can give good classification performan...

Full text



Publication date: 2011